Exploring Semantic Information in Hindi WordNet for Hindi Dependency Parsing
نویسندگان
چکیده
In this paper, we present our efforts towards incorporating external knowledge from Hindi WordNet to aid dependency parsing. We conduct parsing experiments on Hindi, an Indo-Aryan language, utilizing the information from concept ontologies available in Hindi WordNet to complement the morpho-syntactic information already available. The work is driven by the insight that concept ontologies capture a specific real world aspect of lexical items, which is quite distinct and unlikely to be deduced from morpho-syntactic information such as morph, POS-tag and chunk. This complementing information is encoded as an additional feature for data driven parsing and experiments are conducted. We perform experiments over datasets of different sizes. We achieve an improvement of 1.1% (LAS) when training on 1,000 sentences and 0.2% (LAS) on 13,371 sentences over the baseline. The improvements are statistically significant at p<0.01. The higher improvements on 1,000 sentences suggest that the semantic information could address the data sparsity problem.
منابع مشابه
Exploring the effects of Sentence Simplification on Hindi to English Machine Translation System
Even though, a lot of research has already been done on Machine Translation, translating complex sentences has been a stumbling block in the process. To improve the performance of machine translation on complex sentences, simplifying the sentences becomes imperative. In this paper, we present a rule based approach to address this problem by simplifying complex sentences in Hindi into multiple s...
متن کاملAn Insight into Role of Wordnet and Language Network for effective IR from Hindi Text Documents
This paper investigates the limitations of traditional Information Retrieval (IR) models and how the semantic based approaches overcomes these limitations. Further the paper analyzes a range of aspects of language network representation of text corpus and how different network properties can lead to improve the results for different applications of IR. The paper analyzes Hindi Wordnet to exploi...
متن کاملExploring self training for Hindi dependency parsing
In this paper we explore the effect of selftraining on Hindi dependency parsing. We consider a state-of-the-art Hindi dependency parser and apply self-training by using a large raw corpus. We consider two types of raw corpus, one from same domain as of training and testing data and the other from different domain. We also do an experiment, where we add small gold-standard data to the training s...
متن کاملExploring Self-training and Co-training for Dependency Parsing
We explore the effect of self-training and co-training on Hindi dependency parsing. We use Malt parser, which is a state-ofthe-art Hindi dependency parser, and apply self-training using a large unannotated corpus. For co-training, we use MST parser with comparable accuracy to the Malt parser. Experiments are performed using two types of raw corpora— one from the same domain as the test data and...
متن کاملExpansion of the First Hindi-Nepali Word-Net Based Bi-Lingual Dictionary and the advancement of the Human-Machine Interface
Natural Language Processing is introducing a new era in the field of Computer Science and Machine translation. HumanMachine interaction is to play a very important role in the coming centuries as the dependency of human over the machine is increasing variably. Word-Net was first introduced by Miller and Fellbaum in 1985. WordNet is a Lexical database for the Human Languages. It groups the Human...
متن کامل